KiaDev Intelligence

#entropy clipping · 24/08/2025

Prefix-RFT: Guiding LLMs with Partial Demonstrations to Merge SFT and RFT

Prefix-RFT blends supervised fine-tuning (SFT) and reinforcement fine-tuning (RFT) by using partial demonstration prefixes to guide exploration. On math reasoning benchmarks, it achieves stronger and more stable performance than SFT, RFT, and hybrid baselines.
